Results 1 - 8 of 8
1.
J Biomed Inform ; 142: 104386, 2023 06.
Article in English | MEDLINE | ID: covidwho-2316012

ABSTRACT

OBJECTIVE: With the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, there has been a surge in the number of publicly available biomedical information sources, which makes retrieving text relevant to a topic of interest an increasingly challenging research goal. In this paper, we propose a Contextual Query Expansion framework based on the clinical Domain knowledge (CQED) for formalizing an effective search over PubMed to retrieve COVID-19 scholarly articles relevant to a given information need.
MATERIALS AND METHODS: For training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework utilizes a contextual and a domain-specific neural language model to generate a set of candidate query expansion terms that enrich the original query. Moreover, the framework includes a multi-head attention mechanism that is trained alongside a learning-to-rank model for re-ranking the list of generated expansion candidate terms. The original query and the top-ranked expansion terms are posed to the PubMed search engine to retrieve scholarly articles relevant to an information need. The framework, CQED, has four variations, depending on the learning path adopted for training and re-ranking the candidate expansion terms.
RESULTS: The model drastically improves search performance compared to the original query. The improvement over the original query is 190.85% in terms of RECALL@1000 and 343.55% in terms of NDCG@1000. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the model that has been optimized based on Precision outperforms all baselines (0.7987). On the other hand, in terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), the CQED model that has been optimized based on an average of all retrieval measures outperforms all the baselines.
CONCLUSION: The proposed model successfully expands queries posed to PubMed and improves search performance compared to all existing baselines. A success/failure analysis shows that the model improved the search performance of each of the evaluated queries. Moreover, an ablation study showed that if the generated candidate terms are not ranked, the overall performance decreases. For future work, we would like to explore the application of the presented query expansion framework in conducting technology-assisted Systematic Literature Reviews (SLR).
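The RECALL@1000 and NDCG@1000 figures quoted in this abstract can be read against the minimal sketch below. It is an illustration only, not the authors' evaluation code: the function names are hypothetical, the log2 rank discount is one common NDCG convention, and the percentage formula is an assumption about how the relative improvement was computed.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount (rank 1 is undiscounted)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg_at_k(ranked_ids, graded_relevance, k):
    """DCG of the top k results divided by the DCG of an ideal ordering."""
    gains = [graded_relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(graded_relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

def relative_improvement(baseline, expanded):
    """Percentage gain of the expanded query over the original query (assumed formula)."""
    return 100.0 * (expanded - baseline) / baseline
```

Under this assumed formula, a 190.85% improvement would mean the expanded query's score is roughly 2.9 times the score of the original query.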


Subject(s)
COVID-19 , Information Storage and Retrieval , Humans , PubMed , Search Engine , Semantics
2.
14th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2022 ; : 444-453, 2022.
Article in English | Scopus | ID: covidwho-2290980

ABSTRACT

The drug abuse epidemic has been on the rise in the past few years, particularly after the start of the COVID-19 pandemic. Our preliminary observations on Reddit alone show that discussions on drugs increased by between 45% and 200% from 2018 to 2020, and so has the number of unique users participating in those discussions. Existing efforts focused on utilizing social media to distinguish potential drug abuse chats from unharmful chats regardless of what drug is being abused. Others focused on understanding the trends and causes of drug abuse from social media. To this end, we introduce PRISTINE (opioid crisis detection on Reddit), which dynamically detects and extracts evolving misleading drug names from Reddit comments using reinforced Dynamic Query Expansion (DQE) and constructs a textual Graph Convolutional Network with the aid of powerful pre-trained embeddings to detect which drug class a Reddit comment corresponds to. Further, we perform extensive experiments to investigate the effectiveness of our model. © 2022 IEEE.

3.
2022 IEEE International Conference on Big Data, Big Data 2022 ; : 5698-5707, 2022.
Article in English | Scopus | ID: covidwho-2257758

ABSTRACT

The COVID-19 pandemic has caused hate speech on online social networks to become a growing issue in recent years, affecting millions. Our work aims to improve automatic hate speech detection to prevent escalation to hate crimes. The first challenge in hate speech research is that existing datasets suffer from quite severe class imbalances. The second challenge is the sparsity of information in textual data. The third challenge is the difficulty in balancing the tradeoff between utilizing semantic similarity and noisy network language. To combat these challenges, we establish a framework for automatic short text data augmentation by using a semi-supervised hybrid of Substitution Based Augmentation and Dynamic Query Expansion (DQE), which we refer to as SubDQE, to extract more data points from a specific class from Twitter. We also propose the HateNet model, which has two main components, a Graph Convolutional Network and a Weighted Drop-Edge. First, we propose a Graph Convolutional Network (GCN) classifier, using a graph constructed from the thresholded cosine similarities between tweet embeddings to provide new insights into how ideas are connected. Second, we propose a weighted Drop-Edge based stochastic regularization technique, which removes edges randomly based on weighted probabilities assigned by the semantic similarities between Tweets. Using 3 different SubDQE-augmented datasets, we compare our HateNet model using eight different tweet embedding methods, six other baseline classification models, and seven other baseline data augmentation techniques previously used in the realm of hate speech detection. Our results show that our proposed HateNet model matches or exceeds the performance of the baseline models, as indicated by the accuracy and F1 score. © 2022 IEEE.
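The two graph steps named in this abstract (a graph built from thresholded cosine similarities, then similarity-weighted random edge removal) can be sketched roughly as follows. This is not the HateNet implementation: the function names and the `max_drop_rate` parameter are hypothetical, and the abstract does not say how similarity maps to drop probability, so the sketch assumes that weaker edges are dropped more often.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_graph(embeddings, threshold):
    """Return {(i, j): similarity} for pairs whose cosine similarity >= threshold."""
    edges = {}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            sim = cosine(embeddings[i], embeddings[j])
            if sim >= threshold:
                edges[(i, j)] = sim
    return edges

def weighted_drop_edge(edges, max_drop_rate, rng):
    """Randomly drop edges. ASSUMPTION: lower similarity means higher drop probability."""
    kept = {}
    for edge, sim in edges.items():
        drop_prob = max(0.0, max_drop_rate * (1.0 - sim))
        if rng.random() >= drop_prob:
            kept[edge] = sim
    return kept

# Toy 2-d vectors standing in for tweet embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
toy_edges = build_graph(toy_embeddings, threshold=0.5)
toy_kept = weighted_drop_edge(toy_edges, max_drop_rate=0.5, rng=random.Random(0))
```

In a training loop, the drop step would typically be re-sampled each epoch so that the classifier sees a slightly different graph every time, which is what makes it act as a regularizer.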

4.
Concurrency and Computation ; 35(3), 2023.
Article in English | ProQuest Central | ID: covidwho-2235875

ABSTRACT

Due to the technical words employed, which are primarily recognized by medical specialists, information retrieval in the medical area is sometimes described as sophisticated. Because of this, users frequently have trouble coming up with queries utilizing these medical phrases. However, this problem may be readily fixed by an information retrieval system that finds the pertinent terms that fit the user's query and automatically creates a ranking document using these keywords. To enhance the IR performance, the automatic query expansion method is applied by appending additional query terms for the medical domain. We propose a novel fuzzy‐based Grasshopper Optimization Algorithm (GOA) based on automatic query expansion. This work is mainly focused on filtering the most relevant augmented query by utilizing the synchronization score of IR evidence like normalized term frequency, inverse document frequency, and normalization of document length. The main aim of this work is to identify the medical terms that appropriately match the user's queries. The GOA algorithm ranks the terms based on relevance and then identifies the terms with the maximum synchronization value. The documents formed using the optimal expanded query are classified into three types, namely totally relevant, moderately relevant, and marginally relevant. Besides, the proposed work is compared with some state‐of‐the‐art works on performance metrics such as Mean‐Average Precision, F‐measure, Precision‐recall, and Precision rank, evaluated using the TREC‐COVID, TREC Genomics 2007, and MEDLARS medical datasets. For a total of 60 queries, the proposed model offers an F1‐Score of 0.964, 0.959, and 0.968 for the MEDLARS, TREC Genomics, and TREC COVID19 datasets, respectively. The E1‐score and Mean Reciprocal Rate (MRR) of the proposed model are 0.8 and 0.9, respectively, when evaluated using the TREC COVID19 dataset. Performance analyses show that the proposed approach outperforms the other automatic keyword expansion approaches in the medical domain.

5.
1st International Conference on Ambient Intelligence in Health Care, ICAIHC 2021 ; 317:117-132, 2023.
Article in English | Scopus | ID: covidwho-2173918

ABSTRACT

Retrieving relevant information covering different aspects of user information needs and ranking them based on their diverse nature are some of the important problems in the information retrieval domain. Identifying document content covering multiple aspects of information pertaining to a query is of interest to users who wish to see everything about the query. The specific portions (information nuggets) of such documents may talk about specific aspects, and similar aspects of information can be seen across the top k retrieved documents. We have proposed an information retrieval framework using the fine-tuned BERT model that identifies such aspects across the top k documents and extracts this aspect-based information in the form of information nuggets. Similar information nuggets are clustered based on their contextual relevance to specific aspects of the query. This work also applies topic-assisted query expansion to prune the final retrieved set of information nuggets, and the final retrieved set of information covers diverse aspects of user information needs. The experimental results on three datasets, including a COVID-19 dataset, show that the proposed topic-assisted fine-tuned BERT model performs better than the standard Vector Space Model. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

6.
Concurrency and Computation-Practice & Experience ; 2022.
Article in English | Web of Science | ID: covidwho-2172773

ABSTRACT

Due to the technical words employed, which are primarily recognized by medical specialists, information retrieval in the medical area is sometimes described as sophisticated. Because of this, users frequently have trouble coming up with queries utilizing these medical phrases. However, this problem may be readily fixed by an information retrieval system that finds the pertinent terms that fit the user's query and automatically creates a ranking document using these keywords. To enhance the IR performance, the automatic query expansion method is applied by appending additional query terms for the medical domain. We propose a novel fuzzy-based Grasshopper Optimization Algorithm (GOA) based on automatic query expansion. This work is mainly focused on filtering the most relevant augmented query by utilizing the synchronization score of IR evidence like normalized term frequency, inverse document frequency, and normalization of document length. The main aim of this work is to identify the medical terms that appropriately match the user's queries. The GOA algorithm ranks the terms based on relevance and then identifies the terms with the maximum synchronization value. The documents formed using the optimal expanded query are classified into three types, namely totally relevant, moderately relevant, and marginally relevant. Besides, the proposed work is compared with some state-of-the-art works on performance metrics such as Mean-Average Precision, F-measure, Precision-recall, and Precision rank, evaluated using the TREC-COVID, TREC Genomics 2007, and MEDLARS medical datasets. For a total of 60 queries, the proposed model offers an F1-Score of 0.964, 0.959, and 0.968 for the MEDLARS, TREC Genomics, and TREC COVID19 datasets, respectively. The E1-score and Mean Reciprocal Rate (MRR) of the proposed model are 0.8 and 0.9, respectively, when evaluated using the TREC COVID19 dataset. Performance analyses show that the proposed approach outperforms the other automatic keyword expansion approaches in the medical domain.

7.
Measurement ; : 111300, 2022.
Article in English | ScienceDirect | ID: covidwho-1851747

ABSTRACT

Several factors hinder information retrieval in the medical profession. Consumers (laypeople) often struggle to learn medical terms. Because medical terms are more evident to professionals, it is difficult for consumers to construct a query using medical terms. Consumers would find it easier to access relevant medical information if medical words relevant to their query were automatically added. Various kiosks use approaches based on machine vision to form the user queries and monitor their health. This work proposes a hybrid approach to term selection by expanding user queries with medical terms relevant to the medical query. The selection of terms is based on fuzzy similarity reasoning based on two primary term selection strategies. WordNet semantic filtering is applied to prevent query drift, followed by the calculation of BERTScore. The retrieved documents were ranked using Okapi-BM25. For evaluation purposes, six benchmark datasets have been used: CACM, CISI, MEDLINE, ADI, FIRE, and TREC (Covid-19). The results indicate that the suggested technique outperforms the current state-of-the-art.
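The Okapi-BM25 ranking step mentioned in this abstract can be illustrated with the minimal sketch below. It is not the authors' code: the function name, the toy documents, and the defaults `k1=1.2` and `b=0.75` are assumptions, and the IDF term uses the non-negative variant `ln(1 + (N - n + 0.5) / (n + 0.5))`, which is one of several in common use.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1.0 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

# Toy tokenized documents (made-up content) and a one-term query.
toy_docs = [["fever", "cough"], ["heart", "attack"], ["fever", "fever", "headache"]]
toy_scores = bm25_scores(["fever"], toy_docs)
```

An expanded query is scored the same way: the added medical terms are simply appended to `query_terms`, so documents that use the professional vocabulary can match even when the consumer's original wording does not appear in them.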

8.
Med Biol Eng Comput ; 59(10): 1993-2017, 2021 Oct.
Article in English | MEDLINE | ID: covidwho-1427400

ABSTRACT

With emerging medical imaging applications in healthcare, the number and volume of medical images are growing dramatically. The information needs of users in such circumstances, either for clinical or research activities, make the role of powerful medical image search engines more significant. In this paper, a text-based multi-dimensional medical image indexing technique is proposed in which the correlation of the features-usages (according to the users' queries) is considered to provide an off-the content indexing while taking users' interestingness into account. Assuming that each medical image has some extracted features (e.g., based on the DICOM standard), correlations of the features are discovered by performing data mining techniques (i.e., quantitative association pattern discovery) on the history of users' queries as a data set. Then, based on the pairwise correlation of the features of medical images (a.k.a. affinity), the set of all features is fragmented into subsets (using a method like the vertical fragmentation of tables in distributed relational DBs). After that, each of these subsets of features is turned into a hierarchy of features (by applying a hierarchical clustering algorithm to that subset); together, all of these distinct hierarchies make a multi-dimensional structure of the features of medical images, which is in fact the proposed text-based (feature-based) multi-dimensional index structure. By constructing and using such a text-based multi-dimensional index structure via its specific required operations, the medical image retrieval process would be improved in the underlying medical image search engine. Generally, an indexing technique provides a logical representation of documents in order to optimize the retrieval process. The proposed indexing technique is designed such that it can improve retrieval of medical images in a medical image search engine in terms of its effectiveness and efficiency. Considering the correlation of the features of the image would semantically improve the precision (effectiveness) of the retrieval process, while traversing them through the hierarchy in one dimension would try to optimize (i.e., minimize) the resources to achieve better efficiency. The proposed text-based multi-dimensional indexing technique is implemented using the open source search engine Lucene, and compared with the built-in indexing technique available in the Lucene search engine, with the Terrier platform (available for the benchmarking of information retrieval systems), and with other closely related indexing techniques. Evaluation results of memory usage and time complexity analysis, besides the experimental evaluations of efficiency and effectiveness measures, show that the proposed multi-dimensional indexing technique significantly improves both efficiency and effectiveness for a medical image search engine.


Subject(s)
Algorithms , Data Mining , Diagnostic Imaging